Splitting of Compound Terms in non-Prototypical Compounding Languages
Abstract
Compounding occurs in a wide variety of languages, in differing proportions. The rate of compounds in a text depends on the language, but also on the genre and the domain. Scientific and technical texts are especially conducive to compounding, even in languages that are not traditionally regarded as highly compounding. In this article we address the compound splitting of specialized terms. We propose a multilingual method for compound recognition and splitting that uses corpus frequencies, lexical data and, optionally, linguistic rules. The method is supervised and requires a small set of segmented compounds as input. We evaluate it on two languages that rarely serve as material for automatic splitting systems: English and Russian. The results are competitive with those of a state-of-the-art corpus-driven approach.
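To make the idea of frequency-based splitting concrete, here is a minimal sketch of a generic corpus-driven splitter. It is not the method proposed in the article, whose details the abstract does not give. It enumerates candidate segmentations and keeps the one whose parts have the highest geometric mean of corpus frequencies, a scoring commonly used in corpus-driven splitters. The function names, the `min_len` parameter and the toy frequency table are all assumptions made for illustration.

```python
from math import prod


def candidate_splits(word, min_len=3):
    """Yield the unsplit word, then every segmentation into parts of at least min_len characters."""
    yield [word]
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        for tail in candidate_splits(word[i:], min_len):
            yield [head] + tail


def score(parts, freq):
    """Geometric mean of the parts' corpus frequencies (0 if any part is unseen)."""
    counts = [freq.get(p, 0) for p in parts]
    return prod(counts) ** (1.0 / len(counts))


def split_compound(word, freq, min_len=3):
    """Return the best-scoring segmentation; ties favour the unsplit word."""
    return max(candidate_splits(word, min_len), key=lambda parts: score(parts, freq))


# Hypothetical toy frequencies, invented for illustration only.
toy_freq = {"data": 500, "base": 300, "database": 40, "work": 400, "flow": 200}

print(split_compound("database", toy_freq))  # ['data', 'base']
print(split_compound("workflow", toy_freq))  # ['work', 'flow']
print(split_compound("flow", toy_freq))      # ['flow']
```

A sketch like this ignores linking elements and inflection, which is where lexical data and linguistic rules of the kind the abstract mentions would come in.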
Similar resources
Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch
Compounding, the process of combining several simplex words into a complex whole, is a productive process in a wide range of languages. In particular, concatenative compounding, in which the components are “glued” together, leads to problems, for instance, in computational tools that rely on a predefined lexicon. Here we present the AuCoPro project, which focuses on compounding in the closely r...
Towards Unsupervised and Language-independent Compound Splitting using Inflectional Morphological Transformations
In this paper, we address the task of language-independent, knowledge-lean and unsupervised compound splitting, which is an essential component for many natural language processing tasks such as machine translation. Previous methods on statistical compound splitting either include language-specific knowledge (e.g., linking elements) or rely on parallel data, which results in limited applicabilit...
Decompounding query keywords from compounding languages
Splitting compound words has proved to be useful in areas such as Machine Translation, Speech Recognition or Information Retrieval (IR). Furthermore, real-time IR systems (such as search engines) need to cope with noisy data, as user queries are sometimes written quickly and submitted without review. In this paper we apply a state-of-the-art procedure for German decompounding to other compoundi...
Evaluation of Microbial Contamination and Physico-Chemical Properties of Compounding Drugs in Yazd Pharmacies
Aims: Any drug product made in a pharmacy, hospital or factory may be contaminated with microbes. This contamination can originate from raw materials or arise during manufacture of the product. It is also important to study the physical and chemical properties and stability of compounded products. Materials & Method: In this study, a specific sample of a compound drug was ordered from 63 drugstores with ...
Compound terms and their constituent elements in information retrieval
Compounds, especially in languages where compounds are formed by concatenation without intervening whitespace between elements, pose challenges to simple text retrieval algorithms. Search queries that include compounds may not retrieve texts where elements of those compounds occur in uncompounded form; search queries that lack compounds will not retrieve texts where the salient elements are bur...
Publication date: 2014